Fluid benchmark support recordio reader #11121
Conversation
```python
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
if args.use_reader_op:
    filelist = [
        os.path.join(args.data_path, f) for f in os.listdir(args.data_path)
```
We can use `glob` to specify the files.
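The suggestion can be sketched in plain Python; the directory and file names below are hypothetical stand-ins for `args.data_path` and its contents:

```python
import glob
import os
import tempfile

# Hypothetical data directory standing in for args.data_path.
data_path = tempfile.mkdtemp()
for name in ("mnist-0.recordio", "mnist-1.recordio", "notes.txt"):
    open(os.path.join(data_path, name), "w").close()

# The PR's approach: list every entry and join paths by hand.
listed = [os.path.join(data_path, f) for f in os.listdir(data_path)]

# With glob, only files matching the pattern are returned, already joined.
filelist = sorted(glob.glob(os.path.join(data_path, "*.recordio")))
print([os.path.basename(p) for p in filelist])
```

Besides joining paths for free, the glob pattern filters out anything in the directory that is not a recordio file.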
benchmark/fluid/README.md
Outdated
and batch_size you choose:

```bash
python -c 'from recordio_converter import *; prepare_mnist("data", 32)'
```
It's better to set `batch_size=1` here; we can set the batch_size in the trainer reader.
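The idea, writing records with `batch_size=1` and grouping them into batches on the trainer side, can be sketched in plain Python (the reader and function names here are stand-ins, not the Fluid API):

```python
def rebatch(record_reader, batch_size):
    """Group size-1 records into trainer-side batches of batch_size."""
    batch = []
    for record in record_reader():
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # the last batch may be smaller

# Stand-in for a recordio reader that yields one record at a time.
records = lambda: iter(range(10))
print(list(rebatch(records, 4)))
```

Storing size-1 records keeps the converted data independent of any particular batch size, so one converted dataset serves every benchmark configuration.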
Done.
… fluid_benchmark_support_recordioreader
benchmark/fluid/fluid_benchmark.py
Outdated
```diff
 iters, num_samples, start_time = 0, 0, time.time()
 for pass_id in range(args.pass_num):
     train_losses = []
-    for batch_id, data in enumerate(train_reader()):
+    reader_generator = train_reader()
```
Change `reader_generator = train_reader()` to:

```python
if not args.use_reader_op:
    reader_generator = train_reader()
```
benchmark/fluid/fluid_benchmark.py
Outdated
```diff
-    num_samples += len(data)
+    batch_id += 1
+    # FIXME(wuyi): last batch size maybe different
+    num_samples += len(args.batch_size)
```
For `use_reader_op`, if the current pass is not the last, the last batch of this pass is also equal to `args.batch_size`.
benchmark/fluid/fluid_benchmark.py
Outdated
```diff
 for pass_id in range(args.pass_num):
     num_samples = 0
     iters = 0
     start_time = time.time()
-    for batch_id, data in enumerate(train_reader()):
+    reader_generator = train_reader()
```
Change `reader_generator = train_reader()` to:

```python
if not args.use_reader_op:
    reader_generator = train_reader()
```
benchmark/fluid/models/mnist.py
Outdated
```python
        thread_num=args.gpus)
data_file = fluid.layers.double_buffer(
    fluid.layers.batch(
        data_file, batch_size=args.batch_size))
```
For `use_reader_op`, the `batch_size` of `fluid.layers.batch` is set per card; that is, if the batch size is 256 when training VGG and the machine has 4 cards, the `batch_size` for `fluid.layers.batch` should be 64.
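The arithmetic the reviewer describes can be captured in a small helper (the function name is ours, not the benchmark's):

```python
def per_card_batch_size(global_batch_size, num_cards):
    """Split a global batch size evenly across devices."""
    assert global_batch_size % num_cards == 0, "batch size must divide evenly"
    return global_batch_size // num_cards

print(per_card_batch_size(256, 4))  # the VGG example: 256 over 4 cards
```

When each card runs its own copy of the in-graph reader, passing the global batch size to `fluid.layers.batch` would process `num_cards` times more data per step than intended.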
benchmark/fluid/models/vgg.py
Outdated
```python
        thread_num=args.gpus)
data_file = fluid.layers.double_buffer(
    fluid.layers.batch(
        data_file, batch_size=args.batch_size))
```
Same as above.
benchmark/fluid/fluid_benchmark.py
Outdated
```diff
@@ -296,9 +331,10 @@ def train_parallel(avg_loss, infer_prog, optimizer, train_reader, test_reader,
         if iters == args.skip_batch_num:
             start_time = time.time()
             num_samples = 0
-        if iters == args.iterations:
+        # NOTE: if use reader ops, the input data is not splited to multiple cards
+        if args.use_reader_op and iters >= args.iterations / args.gpus:
```
I don't think `iters >= args.iterations / args.gpus` is appropriate. The model's accuracy is highly related to the parameters it has learned, and those depend on the number of parameter updates, so maybe we should not do that.
Well, `args.iterations` is intended to let the benchmark finish quickly; there is no concern for model accuracy. To run full model training, we can set `args.iterations` to -1 so that it runs until all training data has been fed.
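The -1 sentinel behavior can be sketched as a loop (names here are ours, not the benchmark's):

```python
def run_benchmark(batches, iterations):
    """Consume batches; iterations == -1 means run until data is exhausted."""
    done = 0
    for _ in batches:
        done += 1
        if done == iterations:
            break
    return done

print(run_benchmark(range(100), 10))  # capped benchmark run
print(run_benchmark(range(100), -1))  # full run over all data
```

Because the counter can never equal -1, the loop naturally runs to data exhaustion without a special case.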
… fluid_benchmark_support_recordioreader
benchmark/fluid/fluid_benchmark.py
Outdated
```diff
@@ -266,7 +266,10 @@ def train(avg_loss, infer_prog, optimizer, train_reader, test_reader, batch_acc,
         # FIXME(wuyi): For use_reader_op, if the current
         # pass is not the last, the last batch of this pass
         # is also equal to args.batch_size.
-        num_samples += len(args.batch_size)
+        if args.use_reader_op:
+            num_samples += args.batch_size
```
`args.batch_size` is now the batch size on each GPU, so it should be `num_samples += args.batch_size * args.gpus`.
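The sample accounting under discussion can be sketched as a helper (the function and parameter names are hypothetical):

```python
def samples_this_step(use_reader_op, batch_size, gpus, fed_batch_len):
    """Count examples processed in one training step.

    With the in-graph reader, each step consumes batch_size examples on
    every GPU; otherwise, count the Python-side batch actually fed.
    """
    if use_reader_op:
        return batch_size * gpus
    return fed_batch_len

print(samples_this_step(True, 64, 4, 0))    # 64 per GPU on 4 GPUs
print(samples_this_step(False, 64, 4, 60))  # last Python-side batch may be short
```

Forgetting the `* gpus` factor undercounts throughput by the number of cards, which is exactly the bug the reviewer points out.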
Thanks very much! Done. A currently known issue: if `--use_reader_op` is set, we must also set `--no_test`; this will be fixed in the next PR.
… fluid_benchmark_support_recordioreader
This can also fix the issue when running with `--gpus > 1`.